REACHing for chemical safety.
Abstract
Background: We consider the problem of identifying the dynamic interactions in biochemical networks from noisy experimental data. Typically, approaches for solving this problem make use of an estimation algorithm such as the well-known linear Least-Squares (LS) estimation technique. We demonstrate that when time-series measurements are corrupted by white noise and/or drift noise, more accurate and reliable identification of network interactions can be achieved by employing an estimation algorithm known as Constrained Total Least Squares (CTLS). The Total Least Squares (TLS) technique is a generalised least squares method for solving an overdetermined set of equations whose coefficients are noisy. CTLS is a natural extension of TLS to the case where the noise components of the coefficients are correlated, as is usually the case with time-series measurements of concentrations and expression profiles in gene networks.

Results: The superior performance of the CTLS method in identifying network interactions is demonstrated on three examples: a genetic network containing four genes, a network describing p53 activity and mdm2 messenger RNA interactions, and a recently proposed kinetic model for interleukin (IL)-6 and IL-12b messenger RNA expression as a function of ATF3 and NF-κB promoter binding. For the first example, the CTLS significantly reduces the errors in the estimation of the Jacobian for the gene network. For the second, the CTLS reduces the errors from the measurements that are corrupted by white noise and the effect of neglected kinetics. For the third, it allows the correct identification, from noisy data, of the negative regulation of IL-6 and IL-12b by ATF3.
Conclusion: The significant improvements in performance demonstrated by the CTLS method under the wide range of conditions tested here, including different levels and types of measurement noise and different numbers of data points, suggest that its application will enable more accurate and reliable identification and modelling of biochemical networks.

Published: 10 January 2007. BMC Bioinformatics 2007, 8:8. doi:10.1186/1471-2105-8-8. Received: 25 October 2006. Accepted: 10 January 2007. This article is available from: http://www.biomedcentral.com/1471-2105/8/8. © 2007 Kim et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

A key objective of Systems Biology research is to move from a qualitative to a quantitative understanding of cellular signalling and gene networks. Motivated by recent advances in high-throughput genomics and proteomics analysis, and the resulting explosive growth in the amount of data available for analysis, much effort is currently focused on developing reliable methods for inferring the structural and functional organisation of biochemical networks from data obtained by time-series measurements – see for example [1-6] and references therein. Interactions between components of biological networks can conveniently be represented by weighted, directed graphs, where the nodes correspond to the biochemical components, and the edges, represented as arrows with weights attached, indicate the direct quantitative effect that a change in one component has on another component [1].
The weights are, in general, nonlinear functions that represent often largely unknown reaction kinetics, and it is therefore not usually practical to directly determine these weights from experimental data. This is particularly the case for gene networks whose structures are poorly understood in general, even qualitatively. In such cases, a useful approach is to consider the biochemical network behaviour about some steady-state, and assume that it behaves linearly for small deviations from this steady-state [2,3]. With this assumption, the network weights become constants, quantifying the reactions between the components in the neighbourhood of the steady-state. An interaction matrix, known as the Jacobian, is then obtained by grouping the constant weights into a matrix. Several different approaches for determining the Jacobian of a network from time-series data have recently appeared in the literature [1-6]. A common feature of all these approaches is that the network is perturbed in some way, and then data are collected from time-series measurements of one or more components of the network. In [1], an approach was proposed which can handle very general types of system perturbations, such as gene knockouts and inhibitor additions. For these types of perturbations, the exact size, as well as the direct effect of the perturbations will be largely unknown, and therefore the method also allows the determination of the perturbation itself from the data. Another advantage of the approach of [1] is that the effect of unsteady-state initial conditions can be treated as an unknown perturbation and hence also estimated from the data. This removes the requirement for the system to be in a steady-state with known activities and concentrations when the perturbation is applied. 
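As a concrete illustration of this linearisation step, the Jacobian of a nonlinear network model can be approximated numerically at a steady state using central finite differences. The two-species model below is a hypothetical toy example for illustration only, not one of the networks studied in this paper:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros(x.size)
        dx[j] = eps
        J[:, j] = (np.asarray(f(x + dx)) - np.asarray(f(x - dx))) / (2 * eps)
    return J

# Hypothetical two-species toy model (not from the paper): species 1 is
# produced at a constant rate and removed through interaction with species 2;
# species 2 is produced with saturating kinetics and degrades linearly.
def toy_network(x):
    return np.array([1.0 - 0.5 * x[0] * x[1],
                     x[0] / (1.0 + x[0]) - 0.3 * x[1]])

J = numerical_jacobian(toy_network, [1.0, 1.0])
```

Each entry J[i, j] quantifies the local effect of a small change in component j on the rate of change of component i, which is exactly the constant-weight interpretation of the network weights near the steady state.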
Another common feature of almost all the approaches for reverse engineering biomolecular networks so far proposed in the literature is that they employ some estimation algorithm to infer network structure from the measurement data. In [1], for example, the basis of the method for simultaneous estimation of the system states and parameter perturbations is a linear least-squares algorithm. A significant limitation of most such algorithms is that they do not take account of the noise that is inevitably present in the measurement data. Indeed, in the results presented in [1], it was observed that significant levels of noise in the measurement data could lead to quite large errors in the estimated Jacobian matrix. In data from most biological experiments, the error associated with each measurement is substantial. The amount of measurement noise is often poorly defined but arises from 1) errors inherent in the measurement technique; 2) errors in the time a measurement is made (with absolute and drift components); and 3) biological variation in the behaviour of cells or organisms in the assay. Inaccuracy in measurements, leading to noise in the data available for analysis, can, in theory, be addressed by improvement of techniques and by replication. In practice, however, improving measurement quality or increasing replication is often not possible because it can involve slower sampling or result in the inclusion of more biological variation (e.g. through adding parallel cultures, or repeating experiments on different days). Therefore, it is critical to develop analytical approaches which allow robust identification of interactions in biochemical networks from data with a substantial, but poorly-defined, noise component. Such approaches are also valuable in reverse, i.e. in suggesting how experimental sampling strategies can be improved to provide optimal data in terms of both number and accuracy of data points. 
Given the ubiquity of measurement noise in biological data, there is clearly a need for advanced estimation algorithms which can explicitly, and in some sense optimally, take such noise into account when producing estimates of the network interactions. In this paper, we consider two such extensions of the classical Least Squares (LS) algorithm, namely the Total Least Squares (TLS) [7,8] and the Constrained Total Least Squares (CTLS) [9,10] algorithms. The CTLS algorithm, in particular, is shown to be ideally suited to the problem of accurately and reliably identifying functional interactions between network components from noisy data. While both of these algorithms are now routinely used in advanced signal and image processing applications, we believe that this is the first time that their usefulness in Systems Biology has been highlighted.

Results and Discussion

In this section, the performance of the three algorithms described above is tested on an in silico four-gene network example, on a high-fidelity in silico p53 and mdm2 interaction model, and on an example of interleukin (IL)-6 and IL-12b interactions with activating transcription factor 3 (ATF3) and Rel (a component of NF-κB) based on in vivo data. All computations were performed on a 3.06 GHz Pentium IV machine with 1.00 GB of RAM using Windows XP Professional, MATLAB 7.2, and the MATLAB Optimisation Toolbox Version 3.0.4.

A Four-Gene Network Model

A four-gene network example is presented in the supplementary material of [2]. This network was used as a testbed to evaluate the performance of network identification approaches in both [1] and [2].
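For readers unfamiliar with TLS, the classical (unconstrained) solution can be computed from the SVD of the augmented data matrix [A | b]; unlike ordinary LS, it accounts for noise in the coefficient matrix A as well as in the observations b. The sketch below uses synthetic data (all names and values are illustrative and do not come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def tls(A, b):
    """Classical Total Least Squares solution of A x ~= b, computed from
    the SVD of the augmented matrix [A | b]."""
    n = A.shape[1]
    C = np.hstack([A, b.reshape(-1, 1)])
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]          # right singular vector of the smallest singular value
    return -v[:n] / v[n]

# Errors-in-variables example with synthetic data: both the regressor
# matrix and the observations are measured with noise.
x_true = np.array([1.0, -2.0])
A_clean = rng.normal(size=(50, 2))
b_clean = A_clean @ x_true
A_noisy = A_clean + 0.1 * rng.normal(size=A_clean.shape)
b_noisy = b_clean + 0.1 * rng.normal(size=b_clean.shape)

x_ls = np.linalg.lstsq(A_noisy, b_noisy, rcond=None)[0]   # ordinary LS estimate
x_tls = tls(A_noisy, b_noisy)                             # TLS estimate
```

The CTLS algorithm used in the paper goes further by constraining the perturbations of [A | b] to respect the correlation structure induced by time-series measurements; the unconstrained version above is only the starting point.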
The differential equations for the gene network take the form

dxi/dt = Vsi (1 + Aij (xj/Kija)^nij) / [(1 + (xj/Kija)^nij)(1 + (xk/Kiki)^nik)] − Vdi xi/(Kid + xi), i = 1, 2, 3, 4,

where xi(t) is the concentration of mRNAi, gene j activates and gene k inhibits the transcription of gene i (an inhibition term appears only where an inhibition constant is defined below), and the first term and the second term on the right-hand side of the equations represent the rate of transcription and the rate of degradation of each mRNA, respectively. Each maximal enzyme rate is given by Vs1 = 5, Vs2 = 3.5, Vs3 = 3, Vs4 = 4, Vd1 = 200, Vd2 = 500, Vd3 = 150, Vd4 = 500, with units of nM·h-1. The Michaelis constants are given by K14a = 1.6, K24a = 1.6, K32a = 1.5, K43a = 0.15, K12i = 0.5, K31i = 0.7, K1d = 30, K2d = 60, K3d = 10, K4d = 50, in units of nM, and A14 = 4, A24 = 4, A32 = 5, A43 = 2, n12 = 1, n14 = 2, n24 = 2, n31 = 1, n32 = 2, n43 = 2. In the model, gene interactions result in nonlinear dependencies of transcription rates on other mRNA concentrations, which act as communicating intermediaries. The corresponding gene network for this example is shown in Figure 1. For this example, the level of perturbation of Vsi, for i = 1, 2, 3, 4, from the nominal values is 100%, and the measurement noise is assumed to be zero-mean white Gaussian with variance equal to the square of the equilibrium value times 0.02, where the equilibrium states are given by x̄1 = 0.4920, x̄2 = 0.6052, x̄3 = 0.1866, and x̄4 = 0.6514. The number of experiments is four. In each experiment one of the Vsi is perturbed in the negative direction, i.e. inhibited, and the data sampling time is 0.01 h (36 s). The true (simulated) values of xi(t) together with the noisy measurements for this example are shown in Figure 2. The three different least squares algorithms are tested for different numbers of data points per experiment, i.e. 3, 6, 9, 12, 21, 30, 60, and the quality of the Jacobian estimates was evaluated according to a number of different definitions of estimation error, which are discussed in the Methods section. The results generated from 1000 Monte Carlo simulations are given in Table 1. Note that the estimation errors of the TLS for the cases of very few data points, i.e.
6, 9, and 12, are larger than the errors from the standard least squares algorithm. This is because, as discussed later in the Methods section, the TLS algorithm requires a minimum number of data points to work properly. For the case of only 3 data points, all three algorithms provide the same result, since in this case the set of equations to be solved is not over-determined, i.e. there is a single unique solution. Excluding this case, the CTLS reduces the mean of the relative magnitude error for each element of the Jacobian, i.e. εM, by an average of 27% compared with the standard least squares technique, over all the different cases considered. This improvement rises to 37% when the four cases with the fewest data points are removed. The variance of the error is reduced by an average of 25.6%, excluding the first three cases. For the sign estimation error, εS, all three methods give a similar level of performance. The reason for this is easy to see by considering the true Jacobian of the network: it contains no elements that are very close to zero, and therefore the signs of the estimates for each element will be very similar for all three methods. The CTLS almost always gives the best performance in the root mean square sense. A common feature of the results presented in Table 1 is that the accuracy of the estimate improves with increasing numbers of data points. However, beyond a certain critical number of data points, there is no further improvement in the quality of the estimate.
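The noise model of this example (zero-mean white Gaussian noise whose variance is 0.02 times the square of each equilibrium concentration) can be sketched as follows. The equilibrium values and sampling time come from the text; the flat "true" trajectories are a placeholder for illustration, since reproducing the full nonlinear simulation is beyond this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

# Equilibrium mRNA concentrations from the text (nM).
x_eq = np.array([0.4920, 0.6052, 0.1866, 0.6514])

# Noise variance per species: 0.02 * (equilibrium value)^2.
sigma = np.sqrt(0.02) * x_eq

n_samples = 30                    # one of the data-point counts tested
t = np.arange(n_samples) * 0.01   # sampling time of 0.01 h (36 s)

# Placeholder "true" trajectories held at equilibrium; in the paper the
# trajectories come from simulating the full nonlinear model.
x_true = np.tile(x_eq, (n_samples, 1))
x_meas = x_true + rng.normal(scale=sigma, size=x_true.shape)
```

Because the noise standard deviation scales with the equilibrium level of each species, low-abundance species such as mRNA3 are, in absolute terms, measured more precisely than high-abundance ones.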
Journal: Environmental Health Perspectives
Volume: 111, Issue: -
Pages: -
Published: 2003